AITopics | keystep recognition

Procedural activity understanding requires perceiving human actions in terms of a broader task, where multiple keysteps are performed in sequence across a long video to reach a final goal state, such as the steps of a recipe or of a DIY fix-it task. Prior work largely treats keystep recognition in isolation from this broader structure, or else rigidly confines keysteps to align with a particular sequential script. We propose automatically discovering a task graph from how-to videos that probabilistically represents how people tend to execute keysteps, and then leveraging this graph to regularize keystep recognition in novel videos. On multiple datasets of real-world instructional video, we show the impact: more reliable zero-shot keystep localization and improved video representation learning, exceeding the state of the art.
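To make the idea of "mining a probabilistic task graph and using it to regularize recognition" concrete, here is a toy sketch. It is not the authors' method: it assumes the graph is a first-order Markov transition table estimated from observed keystep orderings (with Laplace smoothing), and that regularization means Viterbi decoding that combines per-segment classifier probabilities with log transition probabilities. The function names `mine_task_graph` and `decode`, the `smoothing` and `weight` parameters, and the egg-frying keysteps are all hypothetical.

```python
import math


def mine_task_graph(sequences, smoothing=1.0):
    """Estimate P(next keystep | current keystep) from observed keystep orderings.

    Hypothetical first-order Markov model; Laplace smoothing keeps
    unseen transitions possible instead of forbidding them.
    """
    steps = sorted({s for seq in sequences for s in seq})
    counts = {a: {b: smoothing for b in steps} for a in steps}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    graph = {}
    for a in steps:
        total = sum(counts[a].values())
        graph[a] = {b: c / total for b, c in counts[a].items()}
    return graph


def decode(segment_probs, graph, weight=1.0):
    """Viterbi decoding over video segments (each assumed to hold one keystep).

    Maximizes sum(log classifier prob) + weight * sum(log transition prob).
    """
    steps = list(graph)
    # score[s]: best log-score of any labeling so far that ends in keystep s
    score = {s: math.log(segment_probs[0][s]) for s in steps}
    back = []  # back[t][s]: best predecessor of keystep s at segment t + 1
    for probs in segment_probs[1:]:
        new_score, pointers = {}, {}
        for s in steps:
            prev = max(steps, key=lambda p: score[p] + weight * math.log(graph[p][s]))
            new_score[s] = (score[prev] + weight * math.log(graph[prev][s])
                            + math.log(probs[s]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the highest-scoring final keystep
    path = [max(steps, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy usage: the middle segment is ambiguous between "whisk" and "fry".
sequences = [
    ["crack eggs", "whisk", "fry"],
    ["crack eggs", "whisk", "fry"],
    ["crack eggs", "fry"],
]
graph = mine_task_graph(sequences)
segment_probs = [
    {"crack eggs": 0.8, "whisk": 0.1, "fry": 0.1},
    {"crack eggs": 0.1, "whisk": 0.45, "fry": 0.45},
    {"crack eggs": 0.1, "whisk": 0.1, "fry": 0.8},
]
print(decode(segment_probs, graph))  # ['crack eggs', 'whisk', 'fry']
```

Per-segment argmax alone cannot break the 0.45/0.45 tie in the middle segment; the mined transitions ("crack eggs" is usually followed by "whisk", and "whisk" by "fry") resolve it. The `weight` parameter is an illustrative knob trading off classifier evidence against the graph prior.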

keystep recognition, name change, video-mined task graph

Neural Information Processing Systems

Genre: Instructional Material > Course Syllabus & Notes (0.32)

Industry:

Education > Educational Technology > Media (0.68)
Education > Educational Technology > Audio & Video (0.68)

Technology: Information Technology > Artificial Intelligence (0.79)

d62e65cfdba247e0cd7cac5964f9fbd9-Supplemental-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 08:40:15 GMT

artificial intelligence, keystep, machine learning

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

Neural Information Processing Systems, Oct-9-2025, 08:40:11 GMT

Instructional "how-to" videos online allow users to master new skills and everyday DIY tasks, from

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Israel (0.04)

Genre:

Research Report > Promising Solution (0.68)
Instructional Material > Course Syllabus & Notes (0.43)

Industry:

Education > Educational Technology > Audio & Video (0.53)
Education > Educational Technology > Media (0.43)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

Neural Information Processing Systems, Jan-19-2025, 23:38:19 GMT

instructional video, keystep recognition, video-mined task graph

Neural Information Processing Systems

Genre: Instructional Material > Course Syllabus & Notes (0.66)

Industry: